1 General information

1.1 Welcome Email

Dear students,

A warm welcome to the module Data skills for social work professionals!

As our first Monday class is cancelled because of the Whit Monday holiday (Pfingsten), we would like you to complete a few preparation tasks before the first meeting on Tuesday 21st.

  1. Enroll in the Moodle course (https://moodle.bfh.ch/course/view.php?id=37097) with the following key: FS24-bsc.

  2. It is essential that you have R and RStudio installed and running on your computer before the first classroom session. Please follow the instructions in the “Installation of R and R-Studio” guide (https://drive.switch.ch/index.php/s/ktNsnWxwkJ3olWG), and if necessary, refer to the linked instructions on YouTube. If you have any questions, please feel free to contact us via email.

  3. Familiarize yourself with R. We want you to take advantage of new AI tools and ask Copilot to take you on a tour of R (https://www.bing.com/chat?q=Microsoft+Copilot&FORM=hpcodx). Instruct Copilot on its task with the text below.

  4. Finally, we invite you to familiarize yourself with the topic of “Data Science” and its application in social work. Create a forum post (https://moodle.bfh.ch/mod/forum/view.php?id=2165224) in which you provide a concrete example of how data science can help improve the effectiveness of social work or promote the well-being of clients. What are the potential benefits and challenges of applying data science in this field? We look forward to reading your perspectives and ideas on this topic.

We wish you a successful preparation period and look forward to meeting you in person soon. Please let us know should you have any questions.

Kind regards

Dorian Kessler, Samin Sepahniya


Text to enter into Copilot (Microsoft Copilot in Bing; important: use the conversation style «more creative» (button in the middle of the screen)):

As a social work student, I would like to learn the basics of the programming language R so that I can carry out statistical data analyses for social work projects. I have no prior knowledge of statistics or programming. Could you please give me a step-by-step introduction? Please start by asking whether I have installed R and RStudio and, if not, help me install R and RStudio. Then show me the basic commands and functions of R. I would like to learn how to carry out simple data analyses (with dplyr), visualize data (with ggplot2), and interpret results. Please observe the following:

  • Proceed step by step. Only tell me about the next step once the current step is complete. After each step, ask whether I was able to complete it successfully, to make sure I have done everything correctly.

  • As a first step, tell me exactly how to orient myself visually in RStudio and where I need to type input. Where are the console, the script, the data overview, and the file overview located in RStudio?

  • Explain what the console is and what an R script is, how to create and save an R script, and what scripts are for. Work with me in an R script and tell me how to run commands.

  • Please guide me through practical exercises and give me tasks to consolidate what I have learned.

  • Offer me support when something is unclear.

  • Work with examples that are relevant to social work. Invent relevant data from the fields of social assistance or child and adult protection.

  • Comment the code in detail, line by line, so that I understand it exactly.

  • At the end, offer me further exercises in case I feel like doing more. Suggest exercises.

  • You are an R expert, but you also know that prospective social workers know little about programming and that technical terms need an explanation in everyday language.

  • Thank you for your motivated support and helpfulness! You are helping me learn R and use this knowledge for clients.

  • Important details:

    • Please omit «print()» when it is not needed.

    • Whenever you mention Strg, add Ctrl as well, in case some people have English Windows keyboards.

2 General Introduction

2.1 Learning Goals

  • People learn basic data science tools.

  • People learn how to integrate data science in social work problem solving.

  • People learn how to do data science with R.

2.2 What is data science?

  • Term that emerged ca. 10 years ago. Predecessors: Statistics, Data analysis.

  • The science of creating valuable information from data

  • Practice-oriented science

  • Combines technical and field expertise

2.4 Datafication or why data science is becoming more important in the future

  • Data is the new oil.

  • Data contains information on human behavior = helps us better understand the human world and solve human problems.

  • In the era of AI, “data literacy” becomes a key skill in all areas of life, including social work –> it should be a basic competence

    • Data awareness

    • Skills to interpret and analyze data

2.5 Data sources that are relevant for social work

2.5.3 Found data

  • Data not explicitly generated for research
  • Always on
  • Numbers, text, images, audio, video
  • Data from
    • Online activity (digital communication etc.)
    • Smartphone usage (calling, filming, walking etc.)
    • Administrative registries
    • Payments
    • Smart devices
    • Video surveillance

  • Publicly owned individual data

  • Can be linked using social security numbers

The Swiss federation and cantons store data about all of life’s aspects

2.6 Exercise

  • Think of a social work field

  • What is the goal of social work in that field? What aspects of your clients’ lives do you want to improve?

  • What existing data could you use to measure these aspects of your clients’ lives? Who owns the data? What specific information would you use to measure this? What are technical and ethical limitations?

  • Post your answers on this padlet

2.7 Competency assessment (Kompetenznachweis)

  • You will analyze one of the following data sets

  • Structure

    • Introduction: presentation of the research question and its relevance for social work

    • Methods: documentation of which data were used and how they were analyzed

    • Results: presentation of the results

    • Conclusion: discussion and interpretation of the results with reference to the subject matter and mandate of social work

  • Students also submit an R code file that documents the data preparation and analysis steps. The code file must be reproducible and generate the results used.

  • The competency assessment (documentation, R code) is written in groups of 2-3 people but contains individually authored parts in the text or in the code file (e.g. in the text: introduction, methods, results, conclusion; in the code: data preparation and analysis). The individual contributions must be identified as such at the end of the documentation (chapters; for code: line numbers).

3 Introduction to R

3.1 General Information about R

  • R is free and open source.

  • R has excellent online documentation.

  • R has a very active user community (forums, blogs, etc.).

  • R is more than just statistical software.

  • R has interfaces to numerous other programs.

  • R is interdisciplinary.

  • R is gaining in importance and popularity! (see Popularity Statistics)

  • With RStudio, there is now a powerful tool for an easy and efficient workflow.

  • Advantages

    • Free of charge
    • Active community
    • New methods are implemented faster
    • More flexible/customizable: more than just statistical software
    • Everyone can contribute
  • Disadvantages

    • No centralized support
    • Somewhat more difficult to learn (uses command line inputs instead of pull-down menus)
    • Less consistency between procedures
    • Anyone can contribute
  • Which organizations are behind R?

  • RStudio Environment

    • Console Window
    • Source Editor (Syntax window)
    • File Window, Plot Window
    • Environment Window, History Window
  • First steps

    • R Studio Environment
    • Working Directory
    • R Base Package
    • Install & load packages
    • Simple calculations
    • Data import
    • Execute basic functions such as summary(), help()
    • Comment code with #

3.2 Working Directory, Objects, and Workspace

  • Working Directory
    • getwd(): Displays the working directory.
    • setwd(): Defines a new working directory.
    • dir(): Displays the contents of the current working directory.
    • Do not use \ for path specifications, use / or \\ instead.
  • Objects
    • “Naming guidelines”: meaningful names; no spaces; start with a letter.
    • a <- 10 or a = 10: Creates or overwrites the object a with the content on the right (10).
    • a: Displays the content of the object a.
    • rm(): Deletes objects from the workspace.
    • save(a, b, file = "example.RData"): Saves the specified objects (a, b) in the current working directory.
    • load("example.RData"): Loads all objects saved in the specified file.
    • #: Starts a commented line that is not interpreted.
  • Workspace
    • Contains all objects created in the current session.
    • If the workspace was saved when closing (R asks for it), it will be loaded again when R starts.
    • ls(): Displays all objects in the current workspace.
    • The workspace is saved in the current working directory by default.

3.3 Example: Working Directory, Objects, and Workspace

# Comments start with #
# Everything in the line after # is ignored by R
5+5

getwd() # Display working directory
# Define working directory
# setwd("C:/some/path/") 

dir() # Display the contents of the working directory

a <- 50 # Creates object a (number vector of length 1) with the single value 50
a

# With c() - concatenate you can also build a number vector with several elements:
b <- c(1, 2, 3, 4)

# or shorter
b <- seq(1,4)

# or even shorter
b <- 1:4

# Create object containing the first names of the Beatles
the.beatles <- c("John", "Paul", "George", "Ringo") 
the.beatles # compared to a, the object is now a string/character


# Object names must not contain spaces or hyphens (R reads - as minus). Avoiding _ is also recommended (Google R Style Guide)
# Names should be meaningful. Naming should be consistent throughout the code file (dots, upper/lower case)

ls() # Display workspace
rm(the.beatles) # Delete object
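
The functions save(), load() and rm() from 3.2 can be tried out in a small round trip. The following is a minimal sketch: the object names are invented for illustration, and tempfile() is used only so that nothing is written to your working directory.

```r
# Two example objects (names are illustrative)
clients <- c(12, 7, 30)
district <- "Bern"

# Save both objects to a temporary .RData file
path <- tempfile(fileext = ".RData")
save(clients, district, file = path)

# Remove them from the workspace ...
rm(clients, district)
exists("clients")   # FALSE: the object is gone

# ... and restore them from the file
load(path)
clients
district
```

In your own projects you would typically pass a file name such as "example.RData" instead of a temporary path, so that the file ends up in the working directory.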

3.4 Packages

  • Many objects (functions, data records, etc.) are located in so-called packages (see here)

  • Packages are written and maintained by countless voluntary authors. As a result, numerous methodological niches are well covered (especially in comparison to other statistical software packages)

  • Some packages are part of the core scope of R and are loaded by default when R is started. Other packages can be loaded on request.

  • Warning: Objects contained in packages (e.g. functions) may overlap by name (“function is masked”)

  • Functions in connection with packages:

    • library(packagename): load installed package
    • library(): display all installed packages
    • library(help=packagename): Some package information
    • search(): show currently loaded packages
    • detach("package:packagename"): “unload” package again
    • ls("package:packagename"): show all objects within a package
    • packagename::bar: access a single object (here: bar) from a package without loading the whole package
    • install.packages("packagename"): install package
    • remove.packages("packagename"): uninstall package

3.4.1 Example: Packages and help function

# Let's assume we want to read in an SPSS file:
??spss

# returns references to the functions read.spss (package foreign) and read_por (package haven)
# Install package
# install.packages("foreign") 

# ?read.spss only works once the package is loaded
library(foreign)
?read.spss

# But general recommendation: Google or AI now usually provide better results than the help function

3.4.2 Exercise: Install and load the following packages.

  • dplyr: for working with data sets / data cleansing
  • ggplot2: for data visualization
  • haven: facilitates the reading of SPSS, SAS and Stata files in R.
  • readxl: read Excel files (.xls and .xlsx) into R
  • officer: enables the creation and editing of Microsoft Word documents (.docx) directly in R
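
One possible way to approach this exercise is to first check which of the packages are still missing. The sketch below only uses base R functions; the variable names (wanted, to.install) are made up, and the actual installation line is left commented out because it needs an internet connection.

```r
# Packages requested in the exercise
wanted <- c("dplyr", "ggplot2", "haven", "readxl", "officer")

# Which of them are not installed yet?
to.install <- setdiff(wanted, rownames(installed.packages()))
to.install

# Install the missing ones (uncomment to run)
# install.packages(to.install)

# Load every requested package that is already installed
for (pkg in setdiff(wanted, to.install)) {
  library(pkg, character.only = TRUE)
}
```

Installing one package at a time with install.packages("dplyr") and then calling library(dplyr) works just as well; the loop merely saves typing.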

3.5 Arithmetic and logical operators

  • Arithmetic operators
    • + - * / ^
  • Comparison operators (logical operators)
    • & | == != > < >= <=
  • Mathematical functions (see the help ?function in each case)
    • exp(x)=e^x log(x) log10(x) sin(x) cos(x) tan(x)
    • abs(x) sqrt(x) ceiling(x) floor(x) trunc(x) round(x, digits=n)
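
The rounding functions are easy to confuse, especially for negative numbers. A short illustration with made-up values:

```r
x <- c(-2.7, -2.3, 2.3, 2.7)

ceiling(x)             # always round up:        -2 -2  3  3
floor(x)               # always round down:      -3 -3  2  2
trunc(x)               # cut off the decimals:   -2 -2  2  2
round(x)               # round to nearest:       -3 -2  2  3
round(pi, digits = 2)  # 3.14

abs(x)                 # absolute values
sqrt(16)               # 4
log10(1000)            # 3
exp(log(5))            # 5: exp() and log() undo each other
```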

3.5.1 Example: Calculating and comparing

# Calculate
result <- (23+24)*11/(18+15)*5
result

# Functions
log(2) 
cos(2)

# Comparison
x <- -3:3
x

# Are the elements of x equal to 0?
x == 0

# greater than 0?
x > 0

# less than 0?
x < 0

# greater than or equal to 0?
x >= 0

# less than or equal to 0?
x <= 0

# not equal to 0?
x != 0

# greater than -1 but less than 1
x > -1 & x < 1

# greater than 1 and less than -1 (impossible, so always FALSE)
x > 1 & x < -1

# greater than 1 or less than -1
x > 1 | x < -1

3.5.2 Exercise: Calculating and comparing

  1. calculate the following terms in R
    1. \(((3 + 4 - 5) - 9)^{2}\)
    2. \(log(1)\)
  2. check the following comparisons:
    1. \(5 = 7\)
    2. \(\sqrt{3} \neq cos(17)\)

3.5.3 Solution: Calculating and comparing

# 1.
((3+4-5)-9)^2
log(1)

# 2.
5==7
sqrt(3)!=cos(17)

3.6 Class / data types

  • class(): Reveals the class of an object

  • numeric

  • logical (TRUE/FALSE)

  • Character/String

  • Some data types can be converted, e.g. as.numeric() or as.character()

  • List, e.g. list(1, "Hello", TRUE)

  • Data frame: “list” of vectors of the same length

  • Factors: represent categorical data. These are stored as numerical values but are linked to a value label

3.6.1 Overview of data types

# Numeric vector (class(x) returns "numeric")
x <- c(1, 2, 3)
class(x)
x

# Logical vector
x <- -3:3
y <- x >= 0
y
class(y)

# String/character vector
x <- c("a", "b", "c")
class(x)
x

# list (named my.list so that the function list() is not masked)
my.list <- list(a = c(4:8), b = c("a", "b", "c"), c = c(TRUE, FALSE))
class(my.list)
my.list

# factors
sex <- c(0, 0, 1, 1)
factor(sex, labels=c("man", "woman"))

# Functions: e.g. cos(); mean()
class(mean)
mean
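
Converting between data types, as mentioned in 3.6, might look as follows. The values are invented; suppressWarnings() is used here only to hide the warning R gives when a text cannot be turned into a number.

```r
# Numbers stored as text (this often happens after importing a file)
age.text <- c("34", "51", "unknown")
class(age.text)                  # "character"

# "unknown" cannot be converted and becomes NA
age.num <- suppressWarnings(as.numeric(age.text))
age.num
class(age.num)                   # "numeric"

# Logical values convert to 0/1, which makes counting easy
receives.aid <- c(TRUE, FALSE, TRUE)
as.numeric(receives.aid)         # 1 0 1
sum(receives.aid)                # 2

# A factor stores integer codes plus value labels
sex <- factor(c(0, 0, 1, 1), labels = c("man", "woman"))
as.integer(sex)                  # 1 1 2 2
levels(sex)                      # "man" "woman"
```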

3.6.2 Special values

  • Inf and -Inf: positive and negative infinity
  • NaN: “Not a number”, e.g. 0/0.
  • NA: missing value (Missing)
# Important note on missing values:
x <- c(1, 2, NA, 4)

#wrong:
x == NA
x == "NA"

#correct:
is.na(x)
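
Missing values also affect calculations. The following sketch shows what typically happens and how the argument na.rm = TRUE helps:

```r
x <- c(1, 2, NA, 4)

mean(x)                # NA: one missing value makes the whole result missing
mean(x, na.rm = TRUE)  # mean of the remaining values 1, 2 and 4
sum(is.na(x))          # number of missing values: 1
x[!is.na(x)]           # keep only the non-missing values

0 / 0                  # NaN
1 / 0                  # Inf
is.na(0 / 0)           # TRUE: NaN also counts as missing
is.nan(NA)             # FALSE: NA is not the same as NaN
```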

3.6.3 Data Frames

  • Data frames are the typical format for data sets
  • …it is a list of vectors of the same length (see Wickham)
  • …resembles matrices but the columns can contain different data types
  • data.frame(): creates a data frame
  • as.data.frame(): converts to a data frame
  • order(): returns the row order needed to sort data, e.g. df[order(df$var1), ]
  • summary() and str(): overview of data frames
  • head() and tail(): inspect first/last lines
  • names(): show column names
  • object$var1: directly accesses the column var1 in the data frame object
  • na.omit(): Row-by-row exclusion of missing values, i.e. rows that contain at least 1 missing value

3.6.3.1 Example: Data Frames

# Prepared data are often data frames.
richtungswechsel <- read.csv("S:/MA1082973/_FHNW/BA472/data sets/Richtungswechsel/Richtungswechsel_anonymized data set.csv")

class(richtungswechsel)

# you can also easily build one yourself
beruf <- c("Lehrerin", "Verkäufer", "Pilotin")
nation <- c("CH", "DE", "IT")
id <- 1:3
df <- data.frame(id, beruf, nation)
df

# Addressing rows and column positions
df$nation
df[, "nation"]
df[3, "beruf"]
df[3, 3]
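
The overview functions listed in 3.6.3 can be tried on a small data frame. The data below are invented (four welfare cases, one with a missing value), and the column names are only illustrative.

```r
# Invented data: four cases, one with a missing duration
cases <- data.frame(
  id = 1:4,
  age = c(45, 23, 61, 37),
  months = c(12, NA, 3, 24)  # duration of benefit receipt in months
)

str(cases)      # structure: column names and data types
summary(cases)  # summary statistics per column
names(cases)    # column names
head(cases, 2)  # first two rows

# order() returns row positions; use them to sort the data frame by age
cases.sorted <- cases[order(cases$age), ]
cases.sorted

# Drop every row that contains at least one missing value
cases.complete <- na.omit(cases)
nrow(cases.complete)  # 3
```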

3.6.4 Access to different data areas

# How can I access specific elements of a vector directly?
x <- seq(2, 200, 2)
x
x[1] # first element of x
x[1:10] # the first 10 elements of x

# For two-dimensional objects, both rows and columns can be accessed:
# load Richtungswechsel data
richtungswechsel <- read.csv("S:/MA1082973/_FHNW/BA472/data sets/Richtungswechsel/Richtungswechsel_anonymized data set.csv")

richtungswechsel[1:2, c(3, 6)] # reads: "first to second row, third and sixth column"

# Mnemonic: rows first, columns second.


# Apart from the position, the name of a column (or row) can also be used for referencing:
richtungswechsel[, c("Geschlecht", "Staatsang.", "europe")]

# or by a condition:
richtungswechsel[richtungswechsel$Bezugsdauer > 4, ]

3.6.5 Exercise

  1. read the data set Richtungswechsel into R.
  2. display the tenth and twelfth rows of the data set.
  3. display the columns Bezugsdauer and Bildungsstand for all persons aged between 25 and 40.

3.6.6 Solution

# 1.
richtungswechsel <- read.csv("S:/MA1082973/_FHNW/BA472/data sets/Richtungswechsel/Richtungswechsel_anonymized data set.csv")

# 2.
richtungswechsel[c(10, 12), ]

# 3. 
richtungswechsel[richtungswechsel$Alter >= 25 & richtungswechsel$Alter <= 40, c("Bezugsdauer", "Bildungsstand")] # ages 25 to 40, both included

3.7 Data management

3.7.1 Tidy vs. Messy Data

3.7.1.1 General

  • some conventions for the clean presentation/storage of data (Hadley Wickham)
  • see http://vita.had.co.nz/papers/tidy-data.pdf and http://tidyverse.org/
  • Quintessence:
    • once data is clean (tidy), analysis tools (plotting, model fitting) can also work cleanly and without additional effort (e.g. ggplot2, lm/glm)
    • Cases (observation units) in rows, variables (observation dimensions) in columns
  • Data is messy if
    • Columns are not labeled
    • A column contains more than one variable
    • variables also appear in rows instead of columns
    • different observation units are in the same table
  • Tools to clean data (small selection):
    • dplyr / data.table
    • melt() dcast() from reshape2
    • str_replace(), str_sub() from the stringr package
    • tolower()
    • some more in package tidyr
    • also useful: recode() from John Fox (package car)

3.7.1.2 Example: Tidy vs Messy Data

# Weather data
weather <- read.table("https://raw.githubusercontent.com/justmarkham/tidy-data/master/data/weather.txt", header=TRUE)
head(weather) # the variables are in rows and columns 

# reshape the data (melt) and delete missing values
library(reshape2) # for melt()/dcast()
weather1 <- melt(weather, id=c("id", "year", "month", "element"), na.rm=TRUE)
head(weather1)

# clean column for "day"
library(stringr)    # for str_replace(), str_sub()
weather1$day <- as.integer(str_replace(weather1$variable, "d", ""))

# we do not need the "variable" column
weather1$variable <- NULL

# The element column contains two different variables tmin and tmax. 
# These should be in two columns:
weather1$element <- tolower(weather1$element) # lowercase letters
weather.tidy <- dcast(weather1, ... ~ element) # reshape to two columns
head(weather.tidy)

# the date can also be displayed in a column as a real date:
weather.tidy$date <- as.Date(paste(weather.tidy$year, 
                                   weather.tidy$month, 
                                   weather.tidy$day, sep="-"))
weather.tidy[, c("year", "month", "day")] <- NULL
head(weather.tidy)
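
If reshape2 is not available, a similar wide-to-long step can be sketched with the base R function reshape(). This is an alternative to melt(), not the method used above; the data and column names are invented for illustration.

```r
# Invented messy table: one row per client, one column per month
wide <- data.frame(
  id = c("A", "B"),
  m1 = c(100, 80),
  m2 = c(120, NA),
  m3 = c(90, 85)
)

# Wide -> long: one row per client and month
long <- reshape(
  wide,
  direction = "long",
  varying = c("m1", "m2", "m3"),  # columns to stack
  v.names = "amount",             # name of the new value column
  timevar = "month",              # name of the new key column
  times = 1:3,                    # values of the key column
  idvar = "id"
)

# Drop missing values (comparable to na.rm = TRUE in melt()) and sort
long <- long[!is.na(long$amount), ]
long <- long[order(long$id, long$month), ]
long
```

The result has one observation per row (client and month) and one variable per column, which is the tidy layout described in 3.7.1.1.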

3.7.1.3 Exercise

  • Read the data set SHP into R. You can use this command from the haven package: read_sav("your workingdirectory/SHPLONG_P_USER.sav").

  • Look at the data structure and the variables of the data set. Also use functions such as: summary(dataset$var), head(dataset), names(dataset)

3.7.1.4 Solution

# load libraries & data
library(haven)
shp <- read_sav("S:/MA1082973/_FHNW/BA472/data sets/SHP/SHPLONG_P_USER.sav")

# data structure
head(shp)
names(shp)
summary(shp$AGE)
# Variant 2: download the SHP data. download.file() works whenever you have the direct link to a file
download.file("https://drive.switch.ch/index.php/s/02NutftoUqK4x9V/download", "shp2022.RData", mode = "wb")

# Because the file is in R data format (.RData), we can load it directly (adjust the path to where the data is stored)
load("S:/MA1082973/_FHNW/BA472/data sets/SHP/shp2022.RData")

4 Descriptive statistics

4.1 Contingency tables

  • table(x): one-dimensional contingency table
  • table(x,y): two-dimensional contingency table
  • prop.table(table(x)): relative frequency

4.1.1 Example: Contingency tables

# Let's define a new variable age that contains the age (in years) of ten people.
age <- c(76, 54, 38, 96, 32, 76, 81, 81, 50, 75)

# Let hosp be a variable that records whether the same ten persons have been hospitalized in the last six months (1 = yes, 0 = no)
hosp <- c(1, 0, 0, 0, 0, 1, 0, 0, 1, 0)

# Table
table(age)
table(age, hosp)

# Table in percent
100*prop.table(table(age))
100*prop.table(table(age, hosp))
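
With a variable such as age, almost every value occurs only once, so the table is hard to read. One common remedy, sketched here as an addition to the example, is to group the values with cut() first. The cut-off at 65 and the labels are assumptions chosen for illustration.

```r
age <- c(76, 54, 38, 96, 32, 76, 81, 81, 50, 75)
hosp <- c(1, 0, 0, 0, 0, 1, 0, 0, 1, 0)

# Group age into two classes: up to 64, and 65 or older
age.group <- cut(age, breaks = c(0, 64, Inf), labels = c("under 65", "65+"))

# Give the 0/1 codes readable labels
hosp.f <- factor(hosp, levels = c(0, 1), labels = c("no", "yes"))

# Two-dimensional table of counts
tab <- table(age.group, hosp.f)
tab

# Row percentages: share of hospitalized persons within each age group
round(100 * prop.table(tab, margin = 1), 1)
```

The argument margin = 1 makes each row sum to 100 percent; margin = 2 would do the same for columns.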

4.2 Univariate statistics (1 variable)

  • mean(x): Mean
  • sd(x): Standard deviation
  • var(x): Variance
  • median(x): Median
  • min(x): minimum
  • max(x): Maximum
# Mean and standard deviation
mean(age)
sd(age)

# Median (compare with the sorted values)
median(age) 
sort(age)

# summary function
summary(age)
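
The remaining functions from the list (var(), min(), max()) work in the same way. A brief sketch with the same ten ages:

```r
age <- c(76, 54, 38, 96, 32, 76, 81, 81, 50, 75)

var(age)       # variance
sd(age)^2      # same value: the standard deviation is the square root of the variance
min(age)       # 32
max(age)       # 96
range(age)     # minimum and maximum at once
quantile(age)  # minimum, quartiles and maximum
```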

4.3 Package dplyr

4.3.1 General

  • package dplyr by Hadley Wickham/Romain Francois offers a toolset for data preparation
  • See the dplyr vignette and the Data Wrangling Cheat Sheet for a very good overview
    • filter(): selects a subset of rows (see also slice())
    • arrange(): sorts
    • select(): selects columns
    • mutate(): creates new columns
    • summarize(): aggregates (collapses) data to individual data points
    • distinct(): removes duplicate values
    • group_by(): defines subgroups in the data so that mutate() and summarize() can be applied separately per group.
    • dplyr can be used very well together with so-called piping, i.e. the data object is passed from function to function by %>%, which makes the code much easier to read and more compact.

4.3.2 Example: Package dplyr

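# load package; shp2022 is assumed to be in the workspace already (e.g. loaded from shp2022.RData in 3.7.1.4)
library(dplyr)
head(shp2022)

# filter by 1st nationality not Switzerland and persons up to 65 years old  
# select columns
shp2022a <- shp2022 %>% 
  filter(NAT_1_ != 8100, AGE < 66) %>% # no quotes: "NAT_1_" in quotes would be compared as text, not as a column
  select(AGE, SEX, NAT_1_, EDUCAT)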
  

# create new variable "Tertiary education"
shp2022a <- shp2022a %>%
  mutate(EduTertiary = EDUCAT == 10)

# Count by gender and with a tertiary education (TRUE/FALSE)
tab <- shp2022a %>%
  group_by(SEX, EduTertiary) %>%
  summarise(n=n()) %>%
  arrange(SEX, EduTertiary) %>%
  na.omit()
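
If you do not have the SHP data at hand, the same dplyr verbs can be tried on a small invented data set. The sketch below is not based on SHP: the values are made up, and the coding of SEX (1 = man, 2 = woman) is an assumption for illustration only.

```r
library(dplyr)

# Invented data set: six people
people <- data.frame(
  AGE = c(24, 31, 45, 52, 67, 38),
  SEX = c(1, 2, 2, 1, 2, 1),        # assumed coding: 1 = man, 2 = woman
  EDYEAR = c(12, 16, 9, 18, 12, 14) # years of education
)

result <- people %>%
  filter(AGE >= 25, AGE < 66) %>%       # keep persons aged 25 to 65
  mutate(long.edu = EDYEAR >= 14) %>%   # new TRUE/FALSE column
  group_by(SEX) %>%                     # one group per sex
  summarise(n = n(),                    # number of persons per group
            mean.age = mean(AGE),       # mean age per group
            share.long.edu = mean(long.edu))  # share with 14+ years of education
result
```

Reading the pipe from top to bottom gives the order of the steps: filter rows, add a column, define groups, aggregate per group.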

4.3.3 Exercise: Package dplyr

  • Read the SHP data into RStudio.

  • Restrict the data set for the year 2022 to people who are 25 years or older. Familiarize yourself a little with the data (e.g. head(), summary(), table()).

  • Look at the variables with the information on age (variable AGE), gender (variable SEX), years of education (variable EDYEAR) and first nationality (NAT_1_).

  • Create a crosstab with the variables EDYEAR and SEX.

  • calculate mean and standard deviation for the variable age for men and women.

4.3.4 Solution: Package dplyr

4.4 Bivariate statistics (2 variables)

  • cov(): covariance
  • cor(): Correlation
  • cor(x,y,method="spearman"): Rank correlation
  • ?cor: more information in the helpfile
  • chisq.test(): Chi-square test
  • t.test(): t-test
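
A short sketch of these functions, using the age and hosp vectors from 4.1.1 plus an invented third variable (visits). With only ten made-up observations the results have no substantive meaning; the point is only to show the calls.

```r
age <- c(76, 54, 38, 96, 32, 76, 81, 81, 50, 75)
hosp <- c(1, 0, 0, 0, 0, 1, 0, 0, 1, 0)

# Invented second numeric variable: number of doctor visits
visits <- c(6, 3, 1, 9, 2, 5, 7, 8, 2, 6)

cov(age, visits)                       # covariance
cor(age, visits)                       # Pearson correlation
cor(age, visits, method = "spearman")  # rank correlation

# t-test: does the mean age differ between hospitalized and non-hospitalized persons?
res <- t.test(age ~ hosp)
res$estimate  # group means
res$p.value   # p-value of the test

# chisq.test(table(x, y)) is called in the same way on a two-dimensional table
```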

5 Data visualization

  • Why is data visualization important?

  • Data exploration vs. data presentation

  • Simple diagrams

  • Histograms

  • Scatter charts

  • Bar charts

5.1 Visualization with ggplot2

Hadley Wickham’s ggplot2 package has developed into a particularly useful alternative to plot() over the last few years. Especially complicated plots are easier to implement with ggplot, visually appealing and the code is easily accessible. Most important basic structures:

  • Data that we want to visualize

  • Geometries to define the shapes we want to use for visualization (e.g. a scatter plot, line chart, bar chart)

  • Modify aesthetics to convey different meanings (e.g. colors, size, thickness of a line)

  • Define mappings between geometries and aesthetics (e.g. how big should the data points be)
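
The building blocks above can be sketched with a small invented data set, so that the code runs without the SHP data. The variable names (income, education) and all values are made up for illustration.

```r
library(ggplot2)

# Invented data: income for two education levels
set.seed(1)
toy <- data.frame(
  income = c(rnorm(50, mean = 50000, sd = 8000),
             rnorm(50, mean = 70000, sd = 12000)),
  education = rep(c("secondary", "tertiary"), each = 50)
)

# Data + aesthetic mapping (x) + geometry (histogram) + labels
p.hist <- ggplot(data = toy, aes(x = income)) +
  geom_histogram(bins = 20, fill = "grey60", color = "white") +
  labs(title = "Income distribution (invented data)", x = "Income", y = "Count")

# Same data, different mapping and geometry: one boxplot per education level
p.box <- ggplot(data = toy, aes(x = education, y = income, fill = education)) +
  geom_boxplot() +
  labs(title = "Income by education (invented data)", x = "Education", y = "Income")

p.hist
p.box
```

Only the mapping inside aes() and the geom_ layer change between the two plots; the data stay the same.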

5.1.1 Example: ggplot2

library(ggplot2)
library(dplyr)
library(haven)
# set working directory
setwd("S:/MA1082973/_FHNW/BA472")
# load data
shp <- read_sav("data sets/SHP/SHPLONG_P_USER.sav")

# Scatterplot (simple)
shp2022 <- shp %>%
  filter(YEAR == 2022, AGE > 24)

ggplot(data = shp2022, aes(x = PC46, y = PC45)) +
  geom_point() +
  labs(title = "Scatterplot",
       x = "Weight in kg",
       y = "Height in cm")
## Warning: Removed 3083 rows containing missing values (`geom_point()`).

# Scatterplot colored by sex
ggplot(data = shp2022, aes(x = PC46, y = PC45, color = factor(SEX))) +
  geom_point() +
  labs(title = "Scatterplot",
       x = "Weight in kg",
       y = "Height in cm",
       color = "Sex") +
  scale_color_manual(values = c("lightgreen", "darkviolet", "red"),
                     labels = c("Male", "Female", "Other"))
## Warning: Removed 3083 rows containing missing values (`geom_point()`).

# Create a bar chart
# Remove NA values for the variables SEX and PC44
shp2022 <- shp2022 %>% filter(!is.na(SEX), !is.na(PC44))

ggplot(shp2022, aes(x = factor(PC44), y = after_stat(count), fill = factor(SEX))) +
  geom_bar(stat = "count", position = "dodge") +
  labs(x = "Satisfaction with life", y = "Count", fill = "Sex") +
  # Labels for the legend
  scale_fill_manual(values = c("lightgreen", "violet", "red"),
                    labels = c("Male", "Female", "Other")) +
  ggtitle("Satisfaction with life by sex")

5.1.2 Exercise: Graphics with ggplot2

  1. Load the SHP data.
  2. Create a histogram for income (variable: IPTOTN) for the year 2022. Give the graph a title. Tip: add labs() to the chart.
  3. Create a boxplot showing the distribution of income by educational level (variable: ISCED). In order for the boxplots to be created separately for each education level, you must first save the education variable as a factor (the general form is: data$variable <- as.factor(data$variable)). Also restrict the income (exclude outliers, e.g. all incomes over 200000).

5.1.3 Solution: Graphics with ggplot2

5.2 Shiny Apps: General Information

  • Shiny apps are reactive web applications that allow users to modify components of a calculation and/or its visualization without needing to write R code themselves. This is very useful for communicating results.
  • Shiny apps consist of a website (user interface) and a computer running R (R-server): Input() from the user goes to the R-server, and the R-server sends Output() back.
  • Basic structure: R script (named app.R, or ui.R and server.R) containing
    • library(shiny)
    • ui<-fluidPage() with input and output functions. Possible input and output functions can be found here
    • server<-function(input,output){}
    • shinyApp(ui= ui, server=server) (if ui and server are in one script)
  • The R-Studio introduction to Shiny
  • R-Studio offers various solutions for deploying Shiny apps online (free: Shiny Server Open Source, Shinyapps.io)

5.2.1 Example: Package shiny

# Ensure shiny and ggplot2 are installed
# install.packages("shiny")
# install.packages("ggplot2")

library(shiny)
library(ggplot2)

# Example data: a simple dataset related to social work
data <- data.frame(
  age = sample(18:65, 100, replace = TRUE), # Ages between 18 and 65
  satisfaction = sample(1:10, 100, replace = TRUE) # Satisfaction levels from 1 to 10
)

# Define the UI
ui <- fluidPage(
  titlePanel("Data Visualization for Social Work"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("ageRange", 
                  "Select Age Range:", 
                  min = min(data$age), 
                  max = max(data$age), 
                  value = c(25, 40)),
      sliderInput("satisfactionRange", 
                  "Select Satisfaction Range:", 
                  min = min(data$satisfaction), 
                  max = max(data$satisfaction), 
                  value = c(4, 7))
    ),
    mainPanel(
      textOutput("countOutput"),
      plotOutput("ageDistributionPlot") # Histogram of the selected cases
    )
  )
)

# Define server logic
server <- function(input, output) {
  filteredData <- reactive({
    data[data$age >= input$ageRange[1] & data$age <= input$ageRange[2] & 
           data$satisfaction >= input$satisfactionRange[1] & data$satisfaction <= input$satisfactionRange[2], ]
  })
  
  output$countOutput <- renderText({
    paste("Number of cases within selected range:", nrow(filteredData()))
  })
  
  output$ageDistributionPlot <- renderPlot({
    ggplot(filteredData(), aes(x = age)) +
      geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
      theme_minimal() +
      labs(title = "Age Distribution of Selected Cases",
           x = "Age",
           y = "Frequency")
  })
}

# Run the app
shinyApp(ui = ui, server = server)

6 Measuring the effects of social work

6.1 Why is it important to measure the effects of social work?

  • Improving practice with better knowledge

    • Understanding whether and how social work interventions and offers reach their goals (evaluation)

    • Knowing the most effective interventions and offers

    • Knowing the most cost-effective interventions and offers

  • Legitimizing social work

    • Gaining political support for social work (i.e. money!)
    • Example:

6.2 What is an effect and what not?

  • Effect = difference in the result with influencing variable versus without influencing variable (= counterfactual situation)

  • What is the counterfactual situation?

    • The story of Tom and Eva

      • “Eva was a saleswoman in a supermarket in Lucerne, who came from a poor family. She dropped out of school after the ninth grade to financially support her parents. In 2005, she met Tom, a mechanic from St. Gallen who worked in a workshop. They fell in love and got married in 2007. They had two children: a son named Max in 2008 and a daughter named Zoe in 2010.

        Eva and Tom had a loving marriage but also their problems. They had to deal with the low income, high rent, and high cost of living. They sometimes argued, but they always talked it out and found a solution. They valued each other and supported each other’s wishes.

        However, in 2017, the relationship between Eva and Tom broke down. Eva discovered that Tom was having an affair with a customer he had met in his workshop. She was angry and hurt and confronted him about it. Tom admitted his infidelity and told her that he no longer loved her. He wanted a divorce and to move out.

        Eva was devastated and agreed to the divorce. They had to arrange custody for their children, who suffered from their parents’ separation. They decided on joint custody, with the children alternating between living with each parent. They also had to divide the assets they had accumulated during their marriage. They sold their apartment in Lucerne and split the proceeds.

        Today, Eva and Tom are divorced and live separately. They have both continued their careers but with less success than before. Eva is still a saleswoman in a small supermarket but has no prospects for promotion or salary increase. Tom is still a mechanic in a small workshop but faces a lot of competition from younger colleagues. They earn less money than before and have trouble paying their bills. They also have less time for themselves and for their children, who are unhappy and insecure. Since 2020, Eva has been supported by social welfare.”

    • The counterfactual version

      • “Eva was a saleswoman in a supermarket in Lucerne, who came from a poor family. She dropped out of school after the ninth grade to financially support her parents. In 2005, she met Tom, a mechanic from St. Gallen who worked in a workshop. They fell in love and got married in 2007. They had two children: a son named Max in 2008 and a daughter named Zoe in 2010.

        Eva and Tom had a loving marriage but also their problems. They had to deal with the low income, high rent, and high cost of living. They sometimes argued, but they always talked it out and found a solution. They valued each other and supported each other’s wishes.

        In 2017, Eva found a job as a cashier in a larger supermarket, while Tom continued to work in the workshop. They earned more money and were able to buy their own apartment. They also had more time for themselves and for their children, who were happy and healthy. Today, Eva and Tom are still happily married and looking forward to the future. They plan to take a trip to Spain soon to enjoy the sun. They also hope to become grandparents one day, as their son Max has a lovely girlfriend whom he has been dating for six months. Their daughter Zoe is still at school, but she already has many talents and interests that she wants to pursue.”

# Create two vectors with the years and the estimates for version 1 (real situation)
year <- 2004:2024
est <- c(4, 5, 6, 7, 8, 7, 8, 8, 7, 7, 8, 7, 8, 4, 3, 3, 2, 2, 2, 2, 3)

# Create a table with the vectors as columns
tab1 <- data.frame(year, est, type = "Life Satisfaction")
# Keep the real estimates so that we can compute the effect later
est_real <- est

# Estimates for version 2 (counterfactual scenario)
est <- c(4, 5, 6, 7, 8, 7, 8, 8, 7, 7, 8, 7, 8, 9, 9, 9, 8, 9, 9, 9, 8)

# Create a table with the vectors as columns
tab2 <- data.frame(year, est, type = "Life Satisfaction")

# Effect of the divorce = real situation minus counterfactual scenario
tab3 <- data.frame(year, est = est_real - est, type = "Effect of Divorce", version = "Effect of Divorce")

library(ggplot2)

# Add a new column indicating the version
tab1$version <- "Real Situation"
tab2$version <- "Counterfactual Scenario"

# Combine the two tables
tab <- rbind(tab1, tab2, tab3)

# Create a line plot with ggplot
ggplot(tab, aes(x = year, y = est, color = version)) +  
  geom_line() +  
  facet_grid(type ~ ., scales = "free_y") + 
  labs(
    title = "The Effect of Eva's Divorce", 
    subtitle = "Eva's Life Satisfaction in the Actual and Counterfactual World", 
    x = "Year", 
    color = "Version", 
    y = "Life Satisfaction resp.\nEffect of Divorce on Life Satisfaction"
  ) +  
  geom_vline(xintercept = 2016.5)

  • Exercise
    • Talk to the person sitting next to you.
      • What was the most important event in your life (family, education, work, health, social relationships)?
      • What areas of your life have been affected by this event?
      • What would these areas be like if the event had not happened (can you guess numbers)?
  • Example of measuring effects of social work

6.3 How can we measure the effects of social work with quantitative data?

6.3.1 Exercise

  • Form four groups: one for each method to measure effects

  • Create a social work related example of an effect measure with the respective method. What effect do you measure? What is the outcome measure? What data would you collect?

  • Present the basics of the methodology and the example to your colleagues (at least including the bullet points below)

6.3.2 Asking experts

  • Asking individuals about the subjectively measured effect

  • Example: “On a scale from 0 to 10, how much does one daily glass of wine affect your health?”

  • Advantages

    • Easy to measure: one question

    • Subjective expertise: we know a lot about effects on ourselves (e.g. painkillers)

  • Disadvantages

    • We are unaware of the counterfactual

    • Social desirability bias: we want to please the researcher

6.3.3 Assessing correlations

  • Is there a systematic relationship between two dimensions?

  • Example: wine consumption and dementia

  • Advantages

    • Easy to measure: few questions
      • Wine consumption
      • Dementia symptoms
  • Disadvantages

    • Often: correlation is not equal to causation
    • Why do frequent wine drinkers show less dementia?
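
One possible answer to the question above is a confounder: a third variable that influences both wine consumption and dementia. The sketch below uses simulated data only. The confounder (socioeconomic status, ses), all variable names and all numbers are assumptions made up for illustration, not findings.

```r
# Simulated data only: names and numbers are invented for illustration
set.seed(1)
n <- 5000

# Hypothetical confounder: socioeconomic status (standardized)
ses <- rnorm(n)

# Higher status leads to more wine and, separately, to fewer dementia symptoms.
# Wine itself has NO effect on dementia in this simulation.
wine     <-  0.8 * ses + rnorm(n)
dementia <- -0.8 * ses + rnorm(n)

# The raw correlation is negative although wine has no causal effect
raw_cor <- cor(wine, dementia)

# A regression that holds the confounder constant gives a wine coefficient close to zero
adj_coef <- unname(coef(lm(dementia ~ wine + ses))["wine"])

print(round(c(raw_correlation = raw_cor, adjusted_wine_coefficient = adj_coef), 2))
```

In real data the confounder is often unknown or unmeasured, which is why correlations alone rarely justify statements about effects.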

6.3.4 Experiments - the gold standard

  • Advantages:

    • Reliable statements on causality

    • Control over treatment

  • Disadvantages:

    • Ethical problems

    • High financial and administrative burden

    • Often limited generalizability

    • Little variation in the treatment (often only two conditions: treatment vs. no treatment)

    • Social desirability (except in double-blind studies with placebo)
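
Why random assignment allows statements on causality can be sketched with simulated data. Because chance alone decides who is treated, the two groups are comparable before the treatment, and the difference in mean outcomes estimates the effect. All values below, including the assumed true effect of 3, are made up for illustration.

```r
# Simulated experiment: all numbers are invented for illustration
set.seed(2)
n <- 2000

baseline    <- rnorm(n, mean = 50, sd = 10)     # outcome each person would have without treatment
true_effect <- 3                                # assumed effect built into the simulation
treated     <- sample(rep(c(0, 1), each = n / 2))  # random assignment: half treated, half not

outcome <- baseline + true_effect * treated

# Balance check: baseline values should be similar in both groups
balance <- mean(baseline[treated == 1]) - mean(baseline[treated == 0])

# Effect estimate: difference in mean outcomes between the groups
estimate <- mean(outcome[treated == 1]) - mean(outcome[treated == 0])

print(round(c(baseline_difference = balance, estimated_effect = estimate), 2))

# Welch t-test for the difference between the two groups
t.test(outcome ~ treated)
```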

6.3.5 Natural experiments

  • A random event/dimension (Z) influences the independent variable (X) and affects the outcome (Y) only through X

  • Advantages:

    • Statements on causality (!)

    • Practically relevant

    • No distortion due to deliberate manipulation

  • Disadvantages:

    • Limited samples: Reproducibility?

    • Predetermined treatment

    • Randomness hard to prove
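
The logic of Z, X and Y can be sketched with simulated data. Here Z is a hypothetical random event (e.g. a lottery), U is an unobserved confounder, and the true effect of X on Y is set to 2. The naive regression of Y on X is biased by U, while the ratio cov(Z, Y) / cov(Z, X) (the simple instrumental variable or Wald estimator) recovers the effect. All names and numbers are assumptions for illustration.

```r
# Simulated natural experiment: all numbers are invented for illustration
set.seed(3)
n <- 20000

u <- rnorm(n)               # unobserved confounder
z <- rbinom(n, 1, 0.5)      # random event (hypothetical lottery), unrelated to u
x <- 1 * z + u + rnorm(n)   # the independent variable depends on z and on u

true_effect <- 2
y <- true_effect * x + 3 * u + rnorm(n)   # z affects y only through x

# Naive estimate: biased, because u influences both x and y
naive <- unname(coef(lm(y ~ x))["x"])

# Instrumental variable (Wald) estimate: uses only the variation in x caused by z
iv <- cov(z, y) / cov(z, x)

print(round(c(naive = naive, iv = iv), 2))
```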

6.4 Analyzing social work experiments in R

6.4.1 Example

#We will analyze the effect of "Richtungswechsel" on long-term recipients' vitality. https://journals.sagepub.com/doi/full/10.1177/10497315241232120

#Download the data: with this command you can download files if you have the direct link to the file
download.file("https://drive.switch.ch/index.php/s/zpjX3z1frPOQKqr/download", "Richtungswechsel.RData", mode = "wb")

#Because the file is in R data format, we can load it directly with load()
load("Richtungswechsel.RData")

#To prepare the data for analysis and to analyze the data we will install/load the package "dplyr"
#install.packages("dplyr")
library(dplyr)

#Let's have a look at the Data
View(data.anonym)

#Are there missing values?

summary(data.anonym)

#vit1 contains measures of vitality before Richtungswechsel, vit2 contains measures of vitality after Richtungswechsel

#We will give the data a nicer name
richtungswechsel <- data.anonym%>%
  filter(!is.na(vit2))%>%#We remove missing values
  mutate(vitality.change=vit2-vit1,#Here we calculate a new column measuring the change in vitality

         Group=ifelse(INT1==1,"Intervention group","Comparison group")) #Here ge give the variable INT1 (Group membership) a nicer, more telling name and labels
#Were there more positive changes in the intervention group?

#Here we check whether the two groups are comparable in terms of 

richtungswechsel%>%
  group_by(Group)%>% #We tell R to do all calculations groupwise
  summarise(Mean.Age=mean(age,na.rm = TRUE),
            Mean.Gender=mean(Geschlecht,na.rm = TRUE),
            Mean.Bezugsdauer=mean(Bezugsdauer,na.rm = TRUE)) #We tell R to calculate the group means of age, gender and duration of benefit receipt

#Here we calculate the effect of the intervention
richtungswechsel%>%
  group_by(Group)%>%
  summarise(Mean.Change=mean(vitality.change,na.rm = TRUE),
            Mean.Vit1=mean(vit1,na.rm = TRUE),
            Mean.Vit2=mean(vit2,na.rm = TRUE)) #We tell R to calculate the mean change in vitality and the mean vitality before and after. We must tell it to remove missing values (na.rm = TRUE).

#Was the effect stronger for men or for women?

richtungswechsel%>%
  group_by(Group,Geschlecht=ifelse(Geschlecht==1,"Women","Men"))%>% #We tell R to do all calculations by group and gender
  summarise(Mean.Change=mean(vitality.change,na.rm = TRUE)) #We tell R to calculate the mean value of the change in vitality. We must tell it to remove missing values.


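#Save the effect table

#The code below expects the effect table in an object called tabelle_wirkung.
#Assumption: this is the table with the mean changes by group calculated above.
tabelle_wirkung <- richtungswechsel%>%
  group_by(Group)%>%
  summarise(Mean.Change=mean(vitality.change,na.rm = TRUE),
            Mean.Vit1=mean(vit1,na.rm = TRUE),
            Mean.Vit2=mean(vit2,na.rm = TRUE))

#install.packages("huxtable")
library(huxtable)
# Convert the table into a huxtable object
tabelle_wirkung_hux <- as_hux(tabelle_wirkung)

# Make the first row bold and separate it with a line
bold(tabelle_wirkung_hux)[1,] <- TRUE
bottom_border(tabelle_wirkung_hux)[1,] <- 1

# Make the first column bold
bold(tabelle_wirkung_hux)[,1] <- TRUE

# Save as a Word document (quick_docx() additionally needs the packages "flextable" and "officer")
quick_docx(tabelle_wirkung_hux, file = "tabelle_wirkung.docx")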
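
The group means of the change in vitality can be turned into a single effect estimate: the mean change in the intervention group minus the mean change in the comparison group (a simple difference-in-differences). The sketch below uses invented toy data with the same column names as above (Group, vit1, vit2, vitality.change), not the Richtungswechsel data. The object names toy, changes and effect are hypothetical.

```r
library(dplyr)

# Toy data with the same column names as above; all values are invented
set.seed(4)
toy <- data.frame(
  Group = rep(c("Intervention group", "Comparison group"), each = 100),
  vit1  = rnorm(200, mean = 50, sd = 10)
)

toy <- toy %>%
  mutate(vit2 = vit1 + ifelse(Group == "Intervention group", 5, 1) + rnorm(200, sd = 5),
         vitality.change = vit2 - vit1)

# Mean change in vitality per group
changes <- toy %>%
  group_by(Group) %>%
  summarise(Mean.Change = mean(vitality.change, na.rm = TRUE))

# Effect estimate: mean change in the intervention group minus mean change in the comparison group
effect <- changes$Mean.Change[changes$Group == "Intervention group"] -
  changes$Mean.Change[changes$Group == "Comparison group"]
print(effect)

# Welch t-test: is the difference in changes larger than we would expect by chance?
t.test(vitality.change ~ Group, data = toy)
```

With the real data, the same steps would be applied to richtungswechsel instead of toy.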

6.4.2 Exercise

  • Download data from Fokus Arbeit. You can find them here.

  • Inspect data with respect to missing values and remove observations with missing values

  • Check the distribution of gender, duration of benefit receipt, age and vitality before the intervention in the intervention and the comparison group. Are the two groups comparable?

  • Calculate a new variable measuring the change in vitality before versus after the intervention.

  • Answer the following questions

    • Are social assistance clients in Biel more vital than those participating in Richtungswechsel?

    • Did Fokus Arbeit increase vitality? Did it do so more than Richtungswechsel?

    • Were the effects different by age, gender, duration of benefit receipt?

7 Prediction in social work

  • Risk prediction in SW

  • Examples in practice

  • Black et al. - 2003 - Is the Threat of Reemployment Services More Effect.pdf

  • Gilholm et al. - 2023 - Machine learning to predict poor school performanc.pdf

  • Kleinberg et al. - 2018 - Human Decisions and Machine Predictions.pdf

  • Rittenhouse et al. - 2023 - Algorithms, Humans and Racial Disparities in Child.pdf

  • Stevenson and Doleac - 2022 - Algorithmic Risk Assessment in the Hands of Humans.pdf

  • Tennakoon et al. - 2023 - Using electronic health record data to predict fut.pdf

  • Toros and Flaming - 2018 - Prioritizing Homeless Assistance Using Predictive .pdf

  • OLS, (Machine Learning)

  • Organisation: forming groups

7.1